ABSTRACT
Advances in single-cell transcriptomic technologies have led to increasing use of single-cell RNA sequencing (scRNA-seq) data in large-scale patient cohort studies. The resulting high-dimensional data can be summarized and incorporated into patient outcome prediction models in several ways; however, there is a pressing need to understand the impact of analytical decisions on the quality of such models. In this study, we evaluate the impact of three kinds of analytical choices (model choices, ensemble learning strategies and integration approaches) on patient outcome prediction using five scRNA-seq COVID-19 datasets. First, we examine the difference in performance between using a single-view feature space and a multi-view feature space. Next, we survey multiple learning platforms, from classical machine learning to modern deep learning methods. Lastly, we compare different integration approaches for cases where combining datasets is necessary. Through benchmarking such analytical combinations, our study highlights the power of ensemble learning, consistency among different learning methods and robustness to dataset normalization when using multiple datasets as the model input.
Subject(s)
Benchmarking , COVID-19 , Humans , Gene Expression Profiling , Machine Learning , Sequence Analysis, RNA/methods
ABSTRACT
COVID-19 patients display a wide range of disease severity, from asymptomatic to critical symptoms with high mortality risk. Our ability to understand the interaction of SARS-CoV-2 infected cells within the lung, and of protective or dysfunctional immune responses to the virus, is critical to effectively treating these patients. Currently, our understanding of cell-cell interactions across different disease states, and how such interactions may drive pathogenic outcomes, is incomplete. Here, we developed a generalizable and scalable workflow for identifying cells that are differentially interacting across COVID-19 patients with distinct disease outcomes and used it to examine eight public single-cell RNA-seq datasets (six from peripheral blood mononuclear cells, one from bronchoalveolar lavage and one from nasopharyngeal samples), with a total of 211 individual samples. By characterizing the cell-cell interaction patterns across epithelial and immune cells in lung tissues for patients with varying disease severity, we illustrate diverse communication patterns across individuals, and discover heterogeneous communication patterns among moderate and severe patients. We further illustrate that patterns derived from cell-cell interactions are potential signatures for discriminating between moderate and severe patients. Overall, this workflow can be generalized and scaled to combine multiple scRNA-seq datasets to uncover cell-cell interactions.
Subject(s)
COVID-19 , Cell Communication , Humans , Leukocytes, Mononuclear , SARS-CoV-2 , Workflow
ABSTRACT
High-throughput single-cell technologies hold the promise of discovering novel cellular relationships with disease. However, analytical workflows constructed for these technologies to associate cell proportions with disease often employ unsupervised clustering techniques that overlook the valuable hierarchical structures that have been used to define cell types. We present treekoR, a framework that empirically recapitulates these structures, facilitating multiple quantifications and comparisons of cell type proportions. Our results from twelve case studies reinforce the importance of quantifying proportions relative to parent populations in the analyses of cytometry data, as failing to do so can lead to important biological insights being missed.
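The difference between a proportion relative to all cells and a proportion relative to a parent population can be illustrated with a minimal Python sketch. This is not treekoR's implementation or API (treekoR is a separate framework); the hierarchy `PARENT`, the population names, the function `proportions` and the cell counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy hierarchy (an assumption, not from the paper):
# each population maps to its parent; "all" is the root.
PARENT = {
    "CD8 T": "T cells",
    "CD4 T": "T cells",
    "T cells": "all",
    "B cells": "all",
}


def proportions(labels):
    """Return {population: (proportion of all cells, proportion of parent)}."""
    leaf_counts = Counter(labels)
    # Roll leaf counts up the hierarchy so each parent includes its children.
    totals = Counter()
    for leaf, n in leaf_counts.items():
        node = leaf
        while node != "all":
            totals[node] += n
            node = PARENT[node]
        totals["all"] += n
    return {
        node: (totals[node] / totals["all"], totals[node] / totals[PARENT[node]])
        for node in totals
        if node != "all"
    }


# Two made-up samples: the second has fewer T cells overall,
# but the same CD8/CD4 balance within the T cell compartment.
sample_a = ["CD8 T"] * 40 + ["CD4 T"] * 40 + ["B cells"] * 20
sample_b = ["CD8 T"] * 20 + ["CD4 T"] * 20 + ["B cells"] * 60

for name, sample in [("A", sample_a), ("B", sample_b)]:
    of_all, of_parent = proportions(sample)["CD8 T"]
    print(f"sample {name}: CD8 T = {of_all:.2f} of all, {of_parent:.2f} of parent")
```

In this toy example the CD8 T proportion of all cells halves between the two samples, while its proportion of the parent T cell population is unchanged, so the two quantifications would support different interpretations of the same data.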
Subject(s)
Flow Cytometry/methods , Phenotype , CD8 Antigens , CD8-Positive T-Lymphocytes , COVID-19 , Cluster Analysis , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Single-Cell Analysis/methods
ABSTRACT
COVID-19 is now causing a global pandemic, and there is a demand to explain the different clinical patterns between children and adults. We aimed to clarify the organs and cell types vulnerable to COVID-19 infection and the potential age-dependent expression patterns of five factors (ACE2, TMPRSS2, MTHFD1, CTSL, CTSB) associated with clinical symptoms. In this study, we analyzed expression levels of these five COVID-19 host dependency factors in multiple adult and fetal human organs. The results allowed us to grade organs at risk and also pointed towards the target cell types in each of these organs. Based on these results, we constructed an organ- and cell type-specific vulnerability map of the expression levels of the five COVID-19 factors in the human body, providing insight into the mechanisms behind the symptoms, including the non-respiratory symptoms of COVID-19 infection and injury. In addition, the different expression patterns of the COVID-19 factors offer an explanation for the different clinical patterns between adults and children/infants.